REINFORCEMENT LEARNING IN THE JOINT SPACE: VALUE ITERATION IN WORLDS WITH CONTINUOUS STATES AND ACTIONS
Authors
Christopher Kenneth Monson
Abstract
Christopher Kenneth Monson, Department of Computer Science, Master of Science

Continuous space reinforcement learning algorithms frequently fail to address the possibility of a continuous action space, presumably because of the difficulty of discovering the best action for a particular state. This can, in some cases, severely limit the ability of a learning algorithm to tackle common problems in which different portions of the state space require distinct action granularity. Naïve action discretization does not suffice for problems of this nature, so traditional reinforcement learning approaches that consider only the continuous state space fail to solve them. JoSTLe (Joint Space Triangulation Learner) addresses the need for a reinforcement learning approach that can handle a continuous action space by means of intelligent discretization. It employs the variable resolution discretization techniques of Munos and Moore [MM02], but in an augmented “joint” space, one that includes actions as well as states. The algorithm is shown to work on a problem that requires the treatment of a continuous action space, as well as on one that does not. The efficacy of the algorithm and its sensitivity to parameter tuning are demonstrated through mathematical arguments and experimental data.
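The abstract does not reproduce the algorithm itself, but the backup underlying it is ordinary value iteration carried out over a discretization of the joint state-action space. The sketch below is a minimal illustration of that idea on a uniform grid, assuming toy one-dimensional dynamics and reward; it is not JoSTLe's variable-resolution triangulation, and every grid size and constant is an illustrative assumption.

```python
# Minimal sketch: value iteration over a uniformly discretized joint
# state-action space. NOT the JoSTLe algorithm (which triangulates the joint
# space at variable resolution); dynamics, reward, and grids are toy choices.
import numpy as np

GAMMA = 0.95
states = np.linspace(-1.0, 1.0, 41)    # discretized continuous state
actions = np.linspace(-0.5, 0.5, 21)   # discretized continuous action

def step(s, a):
    """Toy deterministic dynamics: the action shifts the state, clipped to bounds."""
    return np.clip(s + a, -1.0, 1.0)

def reward(s, a):
    """Penalize distance from the goal state 0 and large actions."""
    return -(s ** 2) - 0.1 * (a ** 2)

V = np.zeros(len(states))
for _ in range(200):                   # value-iteration sweeps
    V_new = np.empty_like(V)
    for i, s in enumerate(states):
        # Back up over every joint (state, action) cell reachable from s.
        q = [reward(s, a) + GAMMA * V[np.abs(states - step(s, a)).argmin()]
             for a in actions]
        V_new[i] = max(q)
    converged = np.max(np.abs(V_new - V)) < 1e-6
    V = V_new
    if converged:
        break

i0 = np.abs(states).argmin()           # grid cell nearest s = 0
q0 = [reward(states[i0], a) + GAMMA * V[np.abs(states - step(states[i0], a)).argmin()]
      for a in actions]
print("V(0) ~= %.3f, greedy action at s=0: %.3f" % (V[i0], actions[int(np.argmax(q0))]))
```

A coarser or finer action grid changes which greedy actions are even representable, which is the granularity problem the thesis addresses by refining cells of the joint space where needed rather than fixing one resolution in advance.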
Similar resources
Efficient Reinforcement Learning with Multiple Reward Functions for Randomized Controlled Trial Analysis
We introduce new, efficient algorithms for value iteration with multiple reward functions and continuous state. We also give an algorithm for finding the set of all nondominated actions in the continuous state setting. This novel extension is appropriate for environments with continuous or finely discretized states where generalization is required, as is the case for data analysis of randomized...
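For illustration only, the dominance test behind a "set of all nondominated actions" can be sketched as a Pareto filter over per-objective action values; the Q-table below is a made-up example, not data or code from the cited work.

```python
# Hedged sketch of nondominated-action filtering under multiple reward
# functions: keep an action unless some other action is at least as good on
# every objective and strictly better on one. Values are illustrative.
import numpy as np

# Rows: candidate actions; columns: Q-values under two reward functions.
Q = np.array([[1.0, 0.2],
              [0.8, 0.9],
              [0.5, 0.5],   # dominated by the second row
              [1.2, 0.1]])

def nondominated(q):
    keep = []
    for i, qi in enumerate(q):
        dominated = any(np.all(qj >= qi) and np.any(qj > qi)
                        for j, qj in enumerate(q) if j != i)
        if not dominated:
            keep.append(i)
    return keep

print("Nondominated action indices:", nondominated(Q))  # -> [0, 1, 3]
```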
Learning control under uncertainty: A probabilistic Value-Iteration approach
In this paper, we introduce a probabilistic version of the well-studied Value-Iteration approach, i.e. Probabilistic Value-Iteration (PVI). The PVI approach can handle continuous states and actions in an episodic Reinforcement Learning (RL) setting, while using Gaussian Processes to model the state uncertainties. We further show how the approach can be efficiently realized, making it suitable fo...
Continuous-State Reinforcement Learning with Fuzzy Approximation
Reinforcement learning (RL) is a widely used learning paradigm for adaptive agents. There exist several convergent and consistent RL algorithms which have been intensively studied. In their original form, these algorithms require that the environment states and agent actions take values in a relatively small discrete set. Fuzzy representations for approximate, model-free RL have been proposed i...
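As a rough illustration of how a fuzzy representation can cover a continuous state with a small table, the sketch below interpolates Q-values through triangular membership functions; the centers, widths, and values are illustrative assumptions, not the algorithm studied in the cited paper.

```python
# Hedged sketch of fuzzy value interpolation: triangular membership degrees
# over a continuous state weight a small table of core Q-values.
import numpy as np

centers = np.linspace(0.0, 1.0, 5)                             # membership centers
q_core = np.random.default_rng(0).random((len(centers), 3))    # 3 discrete actions

def memberships(s):
    """Triangular membership degrees of state s w.r.t. each center (normalized)."""
    w = np.maximum(0.0, 1.0 - np.abs(s - centers) / 0.25)
    return w / w.sum()

def q_value(s, a):
    """Approximate Q(s, a) as the membership-weighted mix of core values."""
    return memberships(s) @ q_core[:, a]

s = 0.37
print("Q(s, a) for each action:", [round(q_value(s, a), 3) for a in range(3)])
```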
Reinforcement Using Supervised Learning for Policy Generalization
Applying reinforcement learning in large Markov Decision Processes (MDPs) is an important issue for solving very large problems. Since exact resolution is often intractable, many approaches have been proposed to approximate the value function (for example, TD-Gammon (Tesauro 1995)) or to approximate the policy directly by gradient methods (Russell & Norvig 2002). Such approaches provide a poli...
Model-Based Reinforcement Learning with Continuous States and Actions
Finding an optimal policy in a reinforcement learning (RL) framework with continuous state and action spaces is challenging. Approximate solutions are often inevitable. GPDP is an approximate dynamic programming algorithm based on Gaussian process (GP) models for the value functions. In this paper, we extend GPDP to the case of unknown transition dynamics. After building a GP model for the tran...
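One ingredient mentioned above, a Gaussian process transition model used inside a dynamic-programming backup, can be sketched as follows. This is not the GPDP algorithm itself; the dynamics, reward, grids, and kernel below are toy assumptions for illustration.

```python
# Hedged sketch: fit a GP to observed transitions, then run value iteration
# against the GP's mean prediction of the next state. Toy 1-D problem.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(1)

# Transition samples from an "unknown" system s' = 0.9*s + a + noise.
SA = rng.uniform(-1.0, 1.0, size=(200, 2))            # columns: state, action
S_next = 0.9 * SA[:, 0] + SA[:, 1] + 0.01 * rng.standard_normal(200)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-4)
gp.fit(SA, S_next)                                      # learned transition model

states = np.linspace(-1.0, 1.0, 21)
actions = np.linspace(-0.3, 0.3, 7)
V = np.zeros(len(states))
GAMMA = 0.9

for _ in range(100):                                    # value iteration on the model
    V_new = np.empty_like(V)
    for i, s in enumerate(states):
        sa = np.column_stack([np.full(len(actions), s), actions])
        s_pred = gp.predict(sa)                         # GP mean of next states
        idx = np.abs(states[None, :] - s_pred[:, None]).argmin(axis=1)
        V_new[i] = np.max(-(s ** 2) - 0.1 * actions ** 2 + GAMMA * V[idx])
    V = V_new

print("Model-based V at s=0:", round(V[np.abs(states).argmin()], 3))
```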
Publication date: 2003